Abstract. We consider an ergodic process on finitely many states, with positive
entropy. Our first main result asserts that the distribution function of the normalized
waiting time for the first visit to a small (i.e., over a long block) cylinder set B
is, for the majority of such cylinders and up to epsilon, dominated by the exponential
distribution function $1 - e^{-t}$. That is, the occurrences of the so understood "rare event" B
along the time axis can appear either with gap sizes of nearly exponential distribution
(like in the independent Bernoulli process), or they "attract" each other. Our second
main result states that a typical ergodic process of positive entropy has the following
property: the distribution functions of the normalized hitting times for the majority
of cylinders B of lengths n' converge to zero along a sequence n' whose upper density
is 1. The occurrences of such a cylinder B "strongly attract", i.e., they appear in
"series" of many frequent repetitions separated by huge gaps of nearly complete
absence.

These results, when properly and carefully interpreted, shed some new light, in
purely statistical terms, independently from physics, on a century-old (and so far
rather avoided by serious science) common-sense phenomenon known as the law of
series, asserting that rare events in reality, once they have occurred, have a mysterious
tendency for untimely repetitions.



Introduction 

We study the distribution functions of the hitting (and automatically also return)
time statistics for small cylinder sets in processes on finitely many symbols. We refer
the reader to the rich literature on the subject (e.g. [A-G], [C], [C-K], [D-M],
[H-L-V], [L] and the references therein) for the recent developments in this field.
Many works concentrate on determining whether a process (or a class of processes)
has "exponential asymptotics" or not. These attempts were successful in rather
restricted classes of processes. Our Theorem 1 (and its variant, Theorem 3) is
the first fully general result saying something concrete about all ergodic positive
entropy processes, from this point of view. Namely, we prove that in such processes
any essential limit distribution function for the hitting times is majorized by the
exponential law $1 - e^{-t}$. In particular, this excludes many behaviors proved to exist
in zero entropy, such as the presence of an essential limit law for the return times
concentrated away from zero.



1991 Mathematics Subject Classification. 37A50, 37A35, 37A05, 60G10. 

Key words and phrases. Stationary random process, positive entropy, return time statistic,
hitting time statistic, repelling, attracting, limit law, the law of series, typical property.

This paper was written during the first author's visits at CPT/ISITV in 2005 and 2006, 
supported by CNRS and ISITV. Research of the first author is supported from resources for 
science in years 2005-2008 as research project (grant MENU 1 P03A 021 29, Poland) 
^corresponding author: downar@im.pwr.wroc.pl 






T. DOWNAROWICZ^ AND Y. LACROIX 



This theorem sheds new light on the extensively studied class of ergodic processes
with positive entropy, where, one could expect, all general properties had
been established long ago. It is impossible not to mention here the theorem
of Ornstein and Weiss [O-W2] which relates the return times of long blocks to
entropy. However, this theorem says nothing about the asymptotics of the distribution
of the return times, because the logarithmic limit appearing in the statement
is insensitive to the proportions between the gap sizes.

Our approach is slightly different from the one represented in most papers on the
return/hitting time asymptotics, as we are not interested in computing the limit
laws "at points", i.e., along cylinders shrinking to a point x, where x usually belongs
to a positive (or full) measure set. We describe the restrictions on the distributions
valid for the "majority" of long cylinders B. The passage from our approach to the
limit laws at points is described in the last section.

The proof of Theorem 1 is rather complicated, yet entirely contained within
the classics of ergodic theory; it relies on basic facts on entropy for partitions and
sigma-fields, some elements of the Ornstein theory ($\varepsilon$-independence), the
Shannon-McMillan-Breiman Theorem, the Ornstein-Weiss Theorem on return times, the
Ergodic Theorem, basics of probability and calculus.

Our Theorem 2 belongs to the category describing typical (or generic) properties.
It states that a typical ergodic process with positive entropy (see the last paragraph
of this section for the meaning of typicality among positive entropy processes) has
the following property which we call strong attracting: there exists a subsequence
of lengths $(n')$ of upper density 1 in $\mathbb{N}$, such that the distribution functions of the
normalized hitting times for the majority of cylinders B of lengths n' are "flat",
i.e., close to zero on a long interval. Recall that only recently ([C-K]) it
was discovered that some mixing (but still of entropy zero) transformations admit
nonexponential asymptotics. Our result shows that even some Bernoulli processes
do so, which, in particular, answers in the negative a question of Zaqueu Coelho
[C].

Both inequalities between the distribution function of the normalized hitting time
for an event B and the exponential law $1 - e^{-t}$ have nice and clear interpretations in
terms of what we call attracting (the tendency of the occurrences of B to appear in
series) and repelling (the opposite tendency, toward a more uniform distribution of
occurrences along the time axis). To our knowledge, these interpretations have not
been addressed or discussed in any papers in the field. In these terms, our results can
be expressed as follows. Theorem 1: in any positive entropy process the repelling
of almost every sufficiently long cylinder B is at most marginal. Theorem 2: within
any measure-preserving system of positive entropy, if we "draw" a finite partition,
then most likely it will generate a process in which nearly all long blocks of certain
lengths (belonging to a large subset of $\mathbb{N}$) strongly attract.

If we extrapolate this to processes and rare events running in reality, we obtain 
an astonishing contribution to the century old discussion about the so-called law of 
series (see the next section for more details). 

Our understanding of typicality is somewhat different from the often considered
setup, in which the set of all measure-preserving transformations (the automorphism
group) on a fixed probability space is endowed with the topology of the
weak convergence. In this setup, a typical transformation has entropy zero ([Ro]).
Besides, the property we want to examine (strong attracting) depends on the
generating partition, so we need to allow the partition to vary. Thus, we fix a
measure-preserving system of positive entropy and $m \ge 2$, we consider all factor-processes
generated by varying partitions into at most m elements, and we adopt the notion of
typicality with respect to the usual Rokhlin metric for partitions (which is complete
on such partitions). Here, a typical process has positive entropy, so this approach is
reasonable for studying "typical properties of positive entropy systems". Although
we define typicality within a fixed system, strong attracting turns out to be typical
inside every positive entropy system, which makes our notion of typicality for this
property universal.

The common sense law of series versus our results

A "series" is noted in everyday life when a random event considered extremely
rare happens more than once in a relatively short period of time. In the
common sense, the law of series asserts that such series occur more often than they
intuitively should, indicating the existence of an unexplained physical force or
statistical rule provoking them. For example, runs of good luck happen to gamblers,
leading to high winnings (see [Wi] for the famous case of Charles Wells), people
experience repetitions of similar unlucky events (hence the proverb "misfortune never
comes alone"), or notice series of strange coincidences without particular
consequence, such as meeting people with the same last name on the same day, seeing
several times the same combination of digits in unrelated situations, etc.

The Austrian biologist Dr. Paul Kammerer (1880-1926) was the first scientist to
study this law. Although his book [Km] has attracted a lot of attention with its
numerous suggestive examples, the scientific value of his "statistical" interpretation
is rather questionable. Kammerer himself lost authority due to accusations of
manipulating his biological experiments (unrelated to our topic).

Some very serious scientists, such as the Swiss professor of philosophy Carl
Gustav Jung (1875-1961) and the Austrian Nobel prize winner in physics Wolfgang
Pauli (1900-1958), fascinated by examples of "meaningful coincidences", also conjectured
the existence of undiscovered and mysterious "attracting" forces driving objects
that are alike, or have common features, closer together in time and space, for
which they coined the term "synchronicity". This includes attracting of repetitions
of rare events in time, i.e., the law of series. Critics of synchronicity claim that all
such "unbelievable coincidences" and "series" occur at the rate complying with the
statistics of pure randomness (see e.g. [Mi]). Human memory is keen to register
them as more frequent simply because they are more distinctive.

To be precise, let us agree that an event repeats in time by "pure chance"
when it follows a Poisson process. In a typical realization of such a process, the
distribution of signals along the time axis reveals a natural tendency to create
spontaneous clusters, which can be easily taken for series, but are in fact just
a feature of the random (unbiased) behavior. In order to say that some signal
process obeys the law of series, one should detect in this process a tendency to
create clusters stronger than in the Poisson process. It is possible to formally
define such a tendency without referring to the multidimensional distributions of the
process, only to the single distribution of the normalized waiting time V for the first
signal. Because the waiting time for a signal in the Poisson process is exponential,
such a definition reduces to a simple inequality between the distribution function of
V and the function $1 - e^{-t}$. This is exactly how we define attracting (see the next
section). The extreme form of attracting, strong attracting, as we will define it,
takes place when the signals occur in long series of frequency much higher than the
probability of the signal, compensated by much longer periods of nearly complete
absence.
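The comparison with the exponential law can be tried out numerically. The sketch below is only an illustration under our own assumptions (the helper names `bias`, `bernoulli_signal`, `periodic_signal` and `bursty_signal` are hypothetical and the signal processes are toy examples, not the cylinder sets studied in this paper): it estimates the distribution function F of the normalized waiting time from a finite 0/1 sequence and reports $(1 - e^{-t}) - F(t)$, which should be near zero for independent signals, negative for periodic ones and positive for bursty ones.

```python
import math
import random

def waiting_times(signal):
    """For each position i, the number of steps to the next signal strictly after i
    (positions after the last signal are skipped)."""
    waits = []
    next_hit = None
    for i in range(len(signal) - 1, -1, -1):
        if next_hit is not None:
            waits.append(next_hit - i)
        if signal[i]:
            next_hit = i
    return waits

def hitting_cdf(signal, t):
    """Empirical F(t): fraction of start positions whose normalized waiting time is <= t."""
    p = sum(signal) / len(signal)  # empirical probability of the signal
    waits = waiting_times(signal)
    return sum(1 for w in waits if p * w <= t) / len(waits)

def bias(signal, t):
    """(1 - e^{-t}) - F(t): positive suggests attracting from distance t, negative repelling."""
    return (1.0 - math.exp(-t)) - hitting_cdf(signal, t)

def bernoulli_signal(n, p, seed=0):
    """Independent signals with probability p (the unbiased reference case)."""
    rng = random.Random(seed)
    return [rng.random() < p for _ in range(n)]

def periodic_signal(n, period):
    """One signal every `period` steps."""
    return [i % period == 0 for i in range(n)]

def bursty_signal(n, burst, cycle):
    """`burst` consecutive signals at the start of every `cycle` steps."""
    return [i % cycle < burst for i in range(n)]
```

All three toy signals can be given the same frequency (for instance 0.01), so that only the arrangement of the occurrences along the time axis differs.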



[Figure 1: Comparison between unbiased, repelling, attracting and strongly attracting distributions
of occurrences of an event B along the time.]



In many processes in reality, attracting or even strong attracting is perfectly 
understandable as a result of physical dependence. For example, many events re- 
veal increased frequency of occurrences in so-called periods of propitious conditions, 
which in turn, follow a slowly changing steering process (e.g., floods following the 
climate changes). Such attracting, of course, is not the subject of the mystery be- 
hind the law of series. The challenge is to understand attracting for these events, for 
which we see no physical dependence and which are expected to have the unbiased 
behavior. 

With slight abuse of the complexity of life, our theorems can be interpreted to
support the law of series as predominance of attracting for a certain type of events.
Reality is a realization of a huge measure-preserving system (obviously of positive
entropy). Because we consider a single realization, we may assume ergodicity (a
realization of a non-ergodic process belongs, almost surely, to an ergodic
component). An "elementary rare event" whose occurrences cannot be fully predicted is
a small cylinder set depending on a nondeterministic (i.e., also of positive entropy)
factor-process generated by some finite partition of the phase space of this huge
system. Then the majority of such elementary rare events reveal a tendency to create
series at least as strong as in the Poisson process (unbiased), or stronger. And
in most cases this tendency will be in fact much stronger. Even if a real process
is theoretically modeled by the Bernoulli process with an independent generator,
so it is supposed to be unbiased (for example the process of coin tosses), in reality
the independent partition is always slightly perturbed, and then, by the typicality
result, there will be an essential set of lengths n' such that nearly all blocks of these
lengths strongly attract. Because by Theorem 1, blocks of other lengths cannot
essentially repel, "on average", we will be dealing (against the intuition) with a
substantial predominance of attracting for long configurations.

Notice that the attracting is explained in purely statistical terms, without need- 
ing to understand the physical nature of the tiny dependencies in the perturbed 
generator. 

Of course, this hardly applies to gambling, because the event of, say, drawing
a winning hand, is not a single cylinder, and it involves blocks probably too short
for the attracting to take effect. But the theory may apply to some rare events in
computer science, genetics or in other areas.

Rigorous definitions and statements 

We establish the notation necessary to formulate the main results. Let $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$
be an ergodic process on finitely many symbols, i.e., $\#\mathcal{P} < \infty$, $\sigma$ is the standard
left shift map and $\mu$ is an ergodic shift-invariant probability measure on $\mathcal{P}^{\mathbb{Z}}$. Most
of the time, we will identify finite blocks with their cylinder sets, i.e., we agree that
$\mathcal{P}^n = \bigvee_{i=0}^{n-1} \sigma^{-i}(\mathcal{P})$. Depending on the context, a block $B \in \mathcal{P}^n$ is attached to
some coordinates or it represents a "word" which may appear in different places
along the $\mathcal{P}$-names. We will also use the probabilistic language of random variables.
Then $\mu\{R \in A\}$ ($A \subset \mathbb{R}$) will abbreviate $\mu(\{x \in \mathcal{P}^{\mathbb{Z}} : R(x) \in A\})$. Recall that
if the random variable $R$ is nonnegative and $F(t) = \mu\{R \le t\}$ is its distribution
function, then the expected value of $R$ equals $\int_0^\infty 1 - F(t)\,dt$.

For a set $B$ of positive measure let $R_B$ and $\bar R_B$ denote the random variables
defined on $B$ (with the conditional measure $\mu_B = \frac{\mu}{\mu(B)}$): the absolute and
normalized first return time to $B$, respectively, i.e.,

$$R_B(y) = \min\{i > 0 : \sigma^i(y) \in B\}, \qquad \bar R_B(y) = \mu(B) R_B(y).$$

We denote by $\tilde F_B(t)$ the distribution function of $\bar R_B$. Notice that, by the Kac
Theorem ([Kc]), the expected value of $R_B$ equals $\frac{1}{\mu(B)}$, hence that of $\bar R_B$ is 1 (that
is why we call it "normalized"). We also define

$$G_B(t) = \int_0^t 1 - \tilde F_B(s)\,ds.$$

Clearly, $G_B(t) \le \min\{t, 1\}$ and the equality holds when $\tilde F_B = 1_{[1,\infty)}$, that is,
when $B$ occurs precisely with equal gaps (i.e., periodically); the gap size then equals
$\frac{1}{\mu(B)}$.

Similarly, let $V_B$ be the random variable defined on $\mathcal{P}^{\mathbb{Z}}$ as the hitting time
statistic, i.e., the waiting time for the first visit in $B$ (the defining formula is the same as
for $R_B$, but this time it is regarded on the whole space with the measure $\mu$).
Further, let $\bar V_B = \mu(B) V_B$, called, by analogy, the normalized hitting time (although
the expected value of this variable need not be equal to 1). By ergodicity, $V_B$ and
$\bar V_B$ are well defined. By an elementary consideration of the skyscraper above $B$,
one easily verifies that the distribution function $F_B$ of $\bar V_B$ satisfies, for every $t \ge 0$,
the inequalities:

$$G_B(t) - \mu(B) \le F_B(t) \le G_B(t)$$

(see [H-L-V] for more details). Because we deal with long blocks (so that, by the
Shannon-McMillan-Breiman Theorem, $\mu(B)$ is, with high probability, very small),
we will often replace $F_B$ by $G_B$.
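The relation between the return time and hitting time statistics can be checked on a toy model. The sketch below is our own illustration, not part of the paper's argument: it treats a finite 0/1 sequence as cyclic, with the uniform measure on positions playing the role of $\mu$ and the marked positions playing the role of $B$ (the function names are hypothetical). In this setting each gap of length $g$ contributes $\min(t, \mu(B) g)$ to the integral defining $G_B(t)$.

```python
import random

def cyclic_gaps(signal):
    """Gaps between consecutive occurrences on a cyclic 0/1 sequence."""
    hits = [i for i, s in enumerate(signal) if s]
    n = len(signal)
    return [(hits[(j + 1) % len(hits)] - h - 1) % n + 1 for j, h in enumerate(hits)]

def G(gaps, n, t):
    """G_B(t): integral over [0, t] of one minus the return time distribution function."""
    mu = len(gaps) / n
    return sum(min(t, mu * g) for g in gaps) / len(gaps)

def hitting_cdf(gaps, n, t):
    """F_B(t): fraction of all positions whose normalized waiting time is <= t."""
    mu = len(gaps) / n
    # inside a gap of length g the waiting times are exactly 1, 2, ..., g
    return sum(sum(1 for w in range(1, g + 1) if mu * w <= t) for g in gaps) / n

def check_inequalities(signal, t, tol=1e-9):
    """True if G_B(t) - mu(B) <= F_B(t) <= G_B(t) <= min(t, 1) holds numerically."""
    gaps = cyclic_gaps(signal)
    n = len(signal)
    mu = len(gaps) / n
    F, Gt = hitting_cdf(gaps, n, t), G(gaps, n, t)
    return Gt - mu - tol <= F <= Gt + tol and Gt <= min(t, 1.0) + tol
```

For example, `check_inequalities(signal, 1.0)` can be evaluated on any 0/1 list containing at least one occurrence.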

The key notions of this work are defined below: 

Definition 1. We say that the visits to $B$ attract (resp. repel) each other with
intensity $\varepsilon$ from a distance $t > 0$, if

$$F_B(t) \le 1 - e^{-t} - \varepsilon \quad (\text{resp. if } F_B(t) \ge 1 - e^{-t} + \varepsilon).$$

We abbreviate that $B$ attracts (repels) with intensity $\varepsilon$ if its visits attract (repel)
each other with intensity $\varepsilon$ from some distance $t$.

Definition 2. We say that a process has unbiased behavior if there exist collections
$\mathcal{B}_n \subset \mathcal{P}^n$ satisfying $\mu(\bigcup \mathcal{B}_n) \to 1$, such that $F_{B_n}(t) \to 1 - e^{-t}$ pointwise as $n \to \infty$,
for any sequence of blocks $B_n \in \mathcal{B}_n$.

Definition 3. We say that a process reveals strong attracting, if there is a subset
$\mathbb{N}' \subset \mathbb{N}$ of upper density 1, and collections $\mathcal{B}_{n'} \subset \mathcal{P}^{n'}$ for $n' \in \mathbb{N}'$, satisfying
$\mu(\bigcup \mathcal{B}_{n'}) \to 1$, such that $F_{B_{n'}}(t) \to 0$ pointwise as $n' \to \infty$, for any sequence of
blocks $B_{n'} \in \mathcal{B}_{n'}$.

Let us explain why we use the terms "attracting" and "repelling". We will compare
$(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ with an independent Bernoulli process which is unbiased, i.e., for
any long block $B$, $F_B(t) \approx 1 - e^{-t}$ (and also $\tilde F_B(t) \approx 1 - e^{-t}$) with high uniform
accuracy (much better than $\varepsilon$). Fix some $t > 0$. Consider the random variable
$I$ counting the number of occurrences of $B$ in the time period $[0, \frac{t}{\mu(B)}]$. The
expected value of $I$ equals $\mu(B)\lfloor \frac{t}{\mu(B)} \rfloor \approx t$ (up to the ignorable error $\mu(B)$). On
the other hand, $\mu\{I > 0\} = \mu\{V_B \le \frac{t}{\mu(B)}\} = F_B(t)$. The ratio $\frac{t}{F_B(t)}$ represents
the conditional expected value of $I$ on the set $\{I > 0\}$, i.e., the expected number
of occurrences of $B$ in all intervals with at least one occurrence. Attracting from
the distance $t$ means that $F_B(t)$ is smaller (by $\varepsilon$) in $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ than in an
independent Bernoulli process, i.e., that the above conditional expected value is larger in
$(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ than in the independent process. This fact can be further expressed as
follows: If we observe the process $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ for time $\frac{t}{\mu(B)}$ (which is our "memory
length" or "lifetime of the observer") and we happen to see the event $B$ during
this time at least once, then the expected number of times we will observe the
event $B$ is larger than the analogous value for a cylinder of the same measure in
the independent Bernoulli process. The first occurrence of $B$ "attracts" its further
repetitions. The interpretation of repelling is symmetric.

Obviously, occurrences of an event may simultaneously repel from one distance
and attract from another. Notice that the maximal intensity of repelling is
achieved at $t = 1$ when $B$ appears periodically (this implies repelling from all
distances). The intensity of attracting can be arbitrarily close to 1, which happens
when $F_B(t)$ (hence also $G_B(t)$) remains near zero for some large $t$ (in particular
this implies attracting from nearly all distances, except very small and very large
ones, where marginal repelling can occur). It is easy to see that such a case happens
exactly when the distribution of the normalized return time is nearly concentrated
at zero, i.e., when most points in the set $B$ return after a time considerably smaller
than $\frac{1}{\mu(B)}$. Because the expected value of the return time equals $\frac{1}{\mu(B)}$, there must
be a small portion of $B$ with extremely large values of the return time. In such a case
the event $B$ appears in long series of high frequency, compensated by huge gaps of
nearly complete absence. This is the essence of our notion of strong attracting.

The first main result follows: 

Theorem 1. If $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ is ergodic and has positive entropy, then for every $\varepsilon > 0$
the measure of the union of all $n$-blocks $B \in \mathcal{P}^n$ which repel with intensity $\varepsilon$,
converges to zero as $n$ grows to infinity.

Obviously, Theorem 1 does not exclude the unbiased behavior. For example, a
Bernoulli process with the independent generator is unbiased. In fact, it follows
from the results of [A-G], [H-S-V], that any process with a sufficient rate of mixing
is unbiased (unbiased behavior is implied by "exponential asymptotics").
Nevertheless, our second theorem will say in particular, that processes with the unbiased
behavior are extremely exceptional among positive entropy processes.

Let $(X, \mu)$ be a standard probability space, and let $m$ denote either a finite
integer or the countable cardinal $\aleph_0$. The Rokhlin metric endows the collection of
all measurable $\mu$-distinguishable partitions $\mathcal{P}$ of $X$ into at most $m$ elements with a
topology of a Polish space.

Theorem 2. Let $(X, \mu, T)$ be an ergodic measure-preserving transformation of a
standard probability space, with positive entropy. Fix some $2 \le m \le \aleph_0$. Then, in
the Polish space of all measurable partitions $\mathcal{P}$ of $X$ into at most $m$ elements, there
is a dense $G_\delta$ subset such that every partition in this subset generates a process
which reveals strong attracting.

Because partitions generating positive entropy form a dense open set (see Fact 
5 below), we obtain that in a positive entropy measure preserving system a typical 
partition has both positive entropy and strong attracting. 

More notation and preliminary facts 

We now establish further notation and preliminaries needed in the proofs. If
$A \subset \mathbb{Z}$ then we will write $\mathcal{P}^A$ to denote the partition or sigma-field $\bigvee_{i \in A} \sigma^{-i}(\mathcal{P})$.
We will abbreviate $\mathcal{P}^n = \mathcal{P}^{[0,n)}$, $\mathcal{P}^{-n} = \mathcal{P}^{[-n,-1]}$, $\mathcal{P}^- = \mathcal{P}^{(-\infty,-1]}$ (a "finite future",
a "finite past", and the "full past" of the process).

We assume familiarity of the reader with the basics of entropy for finite partitions 
and sigma-fields in a standard probability space. Our notation is compatible with 
[P] and we refer the reader to this book, as well as to [Sh] and [Wa], for background 
and proofs. In particular, we will be using the following: 

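* The entropy of a partition equals $H(\mathcal{P}) = -\sum_{A \in \mathcal{P}} \mu(A) \log_2(\mu(A))$.

* For two finite partitions $\mathcal{P}$ and $\mathcal{B}$, the conditional entropy $H(\mathcal{P}|\mathcal{B})$ is equal
to $\sum_{B \in \mathcal{B}} \mu(B) H_B(\mathcal{P})$, where $H_B$ is the entropy evaluated for the conditional
measure $\mu_B$ on $B$.

* The same formula holds for conditional entropy given a sub-sigma-field $\mathcal{C}$, i.e.,

$$\sum_{B \in \mathcal{B}} \mu(B) H_B(\mathcal{P}|\mathcal{C}) = H(\mathcal{P}|\mathcal{B} \vee \mathcal{C}).$$

* The entropy of the process is given by any one of the formulas below

$$h = H(\mathcal{P}|\mathcal{P}^-) = \tfrac{1}{r} H(\mathcal{P}^r|\mathcal{P}^-) = \lim_{r \to \infty} \tfrac{1}{r} H(\mathcal{P}^r).$$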
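The first two formulas above can be evaluated directly for finite partitions. The following is a minimal sketch under our own conventions (the representation of a pair of partitions by a matrix `joint[a][b]` holding $\mu(A_a \cap B_b)$ is an assumption made for this illustration).

```python
import math

def entropy(probs):
    """Entropy (base 2) of a probability vector; cells of measure zero contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def conditional_entropy(joint):
    """H(P|B) for joint[a][b] = mu(A_a ∩ B_b), computed as the mu(B)-weighted
    average of the entropies H_B(P) with respect to the conditional measures."""
    total = 0.0
    for b in range(len(joint[0])):
        mu_b = sum(row[b] for row in joint)
        if mu_b > 0:
            total += mu_b * entropy([row[b] / mu_b for row in joint])
    return total
```

When the two partitions are independent, `conditional_entropy(joint)` agrees with the entropy of the row marginals; when one partition determines the other, it vanishes.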

We will exploit the notion of $\varepsilon$-independence for partitions and sigma-fields. The
definition below is an adaptation from [Sh], where it concerns finite partitions only.
See also [Sm] for treatment of countable partitions. Because "$\varepsilon$" is reserved for the
intensity of repelling, we will speak about $\beta$-independence.

Definition 4. Fix $\beta > 0$. A partition $\mathcal{P}$ is said to be $\beta$-independent of a sigma-field
$\mathcal{B}$ if for any $\mathcal{B}$-measurable countable partition $\mathcal{B}'$ holds

$$\sum_{A \in \mathcal{P},\, B \in \mathcal{B}'} |\mu(A \cap B) - \mu(A)\mu(B)| \le \beta.$$

A process $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$ is called a $\beta$-independent process if $\mathcal{P}$ is $\beta$-independent of the
past $\mathcal{P}^-$.
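For two finite partitions the sum in Definition 4 is easy to compute. The sketch below is an illustration under our own assumptions (the names `dependence` and `is_beta_independent` and the matrix representation `joint[a][b]` $= \mu(A_a \cap B_b)$ are hypothetical). It evaluates the sum for the partition $\mathcal{B}$ itself; by the triangle inequality, any coarser $\mathcal{B}$-measurable partition gives a sum that is not larger.

```python
def dependence(joint):
    """Sum over all cells of |mu(A ∩ B) - mu(A) mu(B)| for joint[a][b] = mu(A_a ∩ B_b)."""
    rows = [sum(r) for r in joint]
    cols = [sum(r[b] for r in joint) for b in range(len(joint[0]))]
    return sum(
        abs(joint[a][b] - rows[a] * cols[b])
        for a in range(len(rows))
        for b in range(len(cols))
    )

def is_beta_independent(joint, beta):
    """True if the partition indexed by rows is beta-independent of the one indexed by columns."""
    return dependence(joint) <= beta
```

For independent partitions `dependence` returns (numerically) zero, while for two identical two-set partitions with cells of measure one half it returns 1.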

A partition $\mathcal{P}$ is independent of another partition or a sigma-field $\mathcal{B}$ if and only
if $H(\mathcal{P}|\mathcal{B}) = H(\mathcal{P})$. The following approximate version of this fact holds (see
[Sh, Lemma 7.3] for finite partitions, from which the case of a sigma-field is easily
derived).

Fact 1. A partition $\mathcal{P}$ is $\beta$-independent of another partition or a sigma-field $\mathcal{B}$ if
$H(\mathcal{P}|\mathcal{B}) \ge H(\mathcal{P}) - \xi$, for $\xi$ sufficiently small. □

In the course of the proof, a certain lengthy condition will be in frequent use. Let
us introduce an abbreviation:

Definition 5. Given a partition $\mathcal{P}$ of a space with a probability measure $\mu$ and
$\delta > 0$, we will say that a property $\Phi(A)$ holds for $A \in \mathcal{P}$ with $\mu$-tolerance $\delta$ if

$$\mu\left(\bigcup\{A \in \mathcal{P} : \Phi(A) \text{ holds}\}\right) \ge 1 - \delta.$$

We shall also need an elementary estimate, whose proof is an easy exercise. 

Fact 2. For each $A \in \mathcal{P}$, $H(\mathcal{P}) \le (1 - \mu(A)) \log_2(\#\mathcal{P}) + 1$. □
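The estimate of Fact 2 can be probed numerically on randomly drawn finite partitions. This is merely a sanity check and no substitute for the (easy) proof; the helper names below are our own.

```python
import math
import random

def entropy(probs):
    """Entropy (base 2) of a probability vector."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def fact2_bound(probs, a):
    """Right-hand side (1 - mu(A)) log2(#P) + 1 for the cell A with index a."""
    return (1.0 - probs[a]) * math.log2(len(probs)) + 1.0

def random_partition(k, rng):
    """Measures of the cells of a random partition into k sets."""
    weights = [rng.random() for _ in range(k)]
    total = sum(weights)
    return [w / total for w in weights]

def fact2_holds(probs, tol=1e-12):
    """True if the bound of Fact 2 holds for every cell of the partition."""
    h = entropy(probs)
    return all(h <= fact2_bound(probs, a) + tol for a in range(len(probs)))
```

The bound is most useful when one cell carries almost all of the measure: then the entropy is at most slightly above 1, regardless of the number of remaining cells.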

In addition to the random variables of the absolute and normalized return times
$R_B$ and $\bar R_B$, we will also use the analogous notions of the $k$th absolute return time

$$R_B^{(k)} = \min\{i : \#\{0 < j \le i : \sigma^j(y) \in B\} = k\},$$

and of the normalized $k$th return time $\bar R_B^{(k)} = \mu(B) R_B^{(k)}$ (both defined on $B$), with
$\tilde F_B^{(k)}$ always denoting the distribution function of the latter. Clearly, the expected
value of $\bar R_B^{(k)}$ equals $k$.
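The claim about the expected value can be verified on a toy model. In the sketch below (our own illustration with hypothetical function names) a finite 0/1 sequence is treated as cyclic, the marked positions play the role of $B$ and the uniform measure on them plays the role of $\mu_B$.

```python
def kth_return_times(signal, k):
    """k-th absolute return times, one for each occurrence, on a cyclic 0/1 sequence."""
    hits = [i for i, s in enumerate(signal) if s]
    n, m = len(signal), len(hits)
    times = []
    for j in range(m):
        # the k-th next occurrence is reached after `turns` full cycles
        turns, idx = divmod(j + k, m)
        times.append(hits[idx] - hits[j] + turns * n)
    return times

def mean_normalized_kth_return(signal, k):
    """Average of mu(B) * (k-th return time) over all occurrences."""
    mu = sum(signal) / len(signal)
    times = kth_return_times(signal, k)
    return mu * sum(times) / len(times)
```

In this cyclic model the average equals $k$ exactly (up to rounding), whatever the arrangement of the occurrences.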

The idea of the proof and the basic lemma 

Before we pass to the formal proof of Theorem 1, we would like to orient the
reader in the main framework of the idea behind it. We intend to estimate
(from above, by $1 - e^{-t} + \varepsilon$) the function $G_{BA}$ (replacing $F_{BA}$), for long blocks of
the form $BA \in \mathcal{P}^{[-n,r)}$. The "positive" part $A$ has a fixed length $r$, while we allow
the "negative" part $B$ to be arbitrarily long. There are two key ingredients leading
to the estimation. The first one, contained in Lemma 3, is the observation that for
a fixed typical $B \in \mathcal{P}^{-n}$, the part of the process induced on $B$ (with the conditional
measure $\mu_B$) generated by the partition $\mathcal{P}^r$, is not only a $\beta$-independent process,
but it is also $\beta$-independent of many return times $R_B^{(k)}$ of the cylinder $B$ (see
Figure 2).



[Figure 2: The process $\ldots A_{-1} A_0 A_1 A_2 \ldots$ of $r$-blocks following the copies of $B$ is a $\beta$-independent
process with additional independence properties of the positioning of the copies of $B$.]

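This allows us to decompose (with high accuracy) the distribution function $\tilde F_{BA}$ of
the normalized return time of $BA$ as follows:

$$\tilde F_{BA}(t) = \mu_{BA}\{\bar R_{BA} \le t\} = \mu_{BA}\{R_{BA} \le \tfrac{t}{\mu(BA)}\} = \sum_{k \ge 1} \mu_{BA}\{R_A^{(B)} = k,\ R_B^{(k)} \le \tfrac{t}{p\mu(B)}\}$$

$$\approx \sum_{k \ge 1} \mu_{BA}\{R_A^{(B)} = k\} \cdot \mu_B\{\bar R_B^{(k)} \le \tfrac{t}{p}\} \approx \sum_{k \ge 1} p(1-p)^{k-1} \tilde F_B^{(k)}(\tfrac{t}{p}),$$

where $R_A^{(B)}$ denotes the first (absolute) return time of $A$ in the process induced on
$B$, and $p = \mu_B(A)$.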

The second key observation is, assuming for simplicity full independence, that
when trying to model some repelling for the blocks $BA$, we ascertain that it is
largest when the occurrences of $B$ are purely periodic. Any deviation from periodicity
of the $B$'s may only lead to increasing the intensity of attracting between the
copies of $BA$, never that of repelling. We will explain this phenomenon more formally
in a moment. Now, if $B$ does appear periodically, then the normalized return
time of $BA$ is governed by the same geometric distribution as the normalized return
time of $A$ in the independent process induced on $B$. If $p$ is small, this geometric
distribution function becomes nearly the unbiased exponential law $1 - e^{-t}$. The
smallness of $p$ is a priori regulated by the choice of the parameter $r$ (Lemma 1).
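The convergence of the normalized geometric law to the exponential law is easy to observe numerically. The sketch below is our own illustration (the function names and the evaluation grid are assumptions): it takes the distribution function of $p$ times a geometric waiting time with success probability $p$ and measures its largest deviation from $1 - e^{-t}$ on a grid.

```python
import math

def geometric_cdf(t, p):
    """Distribution function of p * (geometric waiting time with success probability p)."""
    return 1.0 - (1.0 - p) ** math.floor(t / p)

def max_gap_to_exponential(p, t_max=5.0, steps=500):
    """Largest deviation of geometric_cdf from 1 - e^{-t} on a grid over [0, t_max]."""
    ts = [t_max * i / steps for i in range(steps + 1)]
    return max(abs(geometric_cdf(t, p) - (1.0 - math.exp(-t))) for t in ts)
```

Evaluating `max_gap_to_exponential` for a decreasing sequence of values of `p` shows the deviation shrinking roughly in proportion to `p`.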

The phenomenon that, assuming full independence, the repelling of $BA$ is maximized
by periodic occurrences of $B$, and that even then there is nearly no repelling,
is captured by the following elementary lemma, which will also be useful later, near
the end of the rigorous proof.

Lemma 0. Fix some $p \in (0,1)$. Let $F^{(k)}$ ($k \ge 1$) be a sequence of distribution
functions on $[0, \infty)$ such that the expected value of the distribution associated to
$F^{(k)}$ equals $k$. Define

$$F(t) = \sum_{k \ge 1} p(1-p)^{k-1} F^{(k)}(\tfrac{t}{p}), \quad \text{and} \quad G(t) = \int_0^t 1 - F(s)\,ds.$$

Then $G(t) \le \frac{1}{\log e_p}(1 - e_p^{-t})$, where $e_p = (1-p)^{-\frac{1}{p}}$.

Proof. We have

$$G(t) = \sum_{k \ge 1} p(1-p)^{k-1} \int_0^t 1 - F^{(k)}(\tfrac{s}{p})\,ds.$$

We know that $F^{(k)}(t) \in [0, 1]$ and that $\int_0^\infty 1 - F^{(k)}(s)\,ds = k$ (the expected value).
With such constraints, it is the indicator function $1_{[k,\infty)}$ that maximizes the integrals
from 0 to $t$ simultaneously for every $t$ (because the "mass" $k$ above the graph
is, for such choice of the function $F^{(k)}$, swept maximally to the left). The rest
follows by direct calculations:



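$$G(t) \le \sum_{k \ge 1} p(1-p)^{k-1} \int_0^t 1_{[0,k)}(\tfrac{s}{p})\,ds = \int_0^t \sum_{k = \lceil s/p \rceil}^{\infty} p(1-p)^{k-1}\,ds$$

$$= \int_0^t (1-p)^{\lceil s/p \rceil - 1}\,ds \le \frac{(1-p)^{\frac{t}{p}} - 1}{\log (1-p)^{\frac{1}{p}}}. \quad □$$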



Recall that the maximizing distribution functions $\tilde F_B^{(k)} = 1_{[k,\infty)}$ occur, for the
normalized return time of a set $B$, precisely when $B$ is visited periodically. This
explains our former statement on this subject.

Let us comment a bit more on the first key ingredient, the $\beta$-independence.
Establishing it is the most complicated part of the argument. The idea is to prove
conditional (given a "finite past" $\mathcal{P}^{-n}$) $\beta$-independence of the "present" $\mathcal{P}^r$ from
jointly the full past and a large part of the future, responsible for the return times
of the majority of the blocks $B \in \mathcal{P}^{-n}$. But the future part must not be too large.
Let us mention the existence of "bilaterally deterministic" processes with positive
entropy (first discovered by Gurevic [G], see also [O-W1]), in which the sigma-fields
generated by the coordinates $(-\infty, -m] \cup [m, \infty)$ do not decrease with $m$ to the
Pinsker factor; they are all equal to the entire sigma-field. (Coincidentally, our
Example 1 has precisely this property; see Remark 2.) Thus, in order to maintain
any trace of independence of the "present" from our sigma-field already containing
the entire past, its part in the future must be selected with extreme care. Let us
also remark that an attempt to save on the future sigma-fields by adjusting them
individually to each block $B_0 \in \mathcal{P}^{-n}$ falls short, mainly because of the "off diagonal
effect": suppose $\mathcal{P}^r$ is conditionally (given $\mathcal{P}^{-n}$) nearly independent of a sigma-field
which determines the return times of only one selected block $B_0 \in \mathcal{P}^{-n}$. The
independence still holds conditionally given any cylinder $B \in \mathcal{P}^{-n}$ from a collection
of a large measure, but unfortunately, this collection can always miss the selected
cylinder $B_0$. In Lemmas 2 and 3, we succeed in finding a sigma-field (containing the
full past and a part of the future), of which $\mathcal{P}^r$ is conditionally $\beta$-independent, and
which "nearly determines", for the majority of blocks $B \in \mathcal{P}^{-n}$, some finite number
of their sequential return times (probably not all of them). This finite number is
sufficient to allow the decomposition of the distribution function $\tilde F_{BA}$ described
earlier.

The proof of Theorem 1 

Throughout the sequel we assume ergodicity and that the entropy $h$ of $(\mathcal{P}^{\mathbb{Z}}, \mu, \sigma)$
is positive. We begin our computations with an auxiliary lemma allowing us to
assume (by replacing $\mathcal{P}$ by some $\mathcal{P}^r$) that the elements of the "present" partition
are small, relatively in most of $B \in \mathcal{P}^{-n}$ and for every $n$. Note that the
Shannon-McMillan-Breiman Theorem is insufficient: for the conditional measure the error
term in that theorem depends increasingly on $n$, which we do not fix.

Lemma 1. For each $\delta$ there exists an $r \in \mathbb{N}$ such that for every $n \in \mathbb{N}$ the following
holds for $B \in \mathcal{P}^{-n}$ with $\mu$-tolerance $\delta$:

for every $A \in \mathcal{P}^r$, $\mu_B(A) \le \delta$.
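Proof. Let $\alpha$ be so small that

$$\sqrt{\alpha} \le \delta \quad \text{and} \quad \frac{h - 3\sqrt{\alpha}}{h + \alpha} \ge 1 - \frac{\delta}{2},$$

and set $\gamma = \frac{\alpha}{\log_2(\#\mathcal{P})}$. Let $r$ be so big that

$$\frac{1}{r} \le \alpha, \qquad \frac{1}{r(h + \alpha)} \le \frac{\delta}{2},$$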

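and that there exists a collection $\bar{\mathcal{P}}^r$ of no more than $2^{r(h+\alpha)} - 1$ elements of $\mathcal{P}^r$
whose joint measure $\mu$ exceeds $1 - \gamma$ (by the Shannon-McMillan-Breiman Theorem).

Let $\tilde{\mathcal{P}}^r$ denote the partition into the elements of $\bar{\mathcal{P}}^r$ and the complement of
their union, and let $\mathcal{R}$ be the partition into the remaining elements of $\mathcal{P}^r$ and the
complement of their union, so that $\mathcal{P}^r = \tilde{\mathcal{P}}^r \vee \mathcal{R}$. For any $n$ we have

$$rh = H(\mathcal{P}^r|\mathcal{P}^-) \le H(\mathcal{P}^r|\mathcal{P}^{-n}) = H(\tilde{\mathcal{P}}^r \vee \mathcal{R}|\mathcal{P}^{-n}) = H(\tilde{\mathcal{P}}^r|\mathcal{R} \vee \mathcal{P}^{-n}) + H(\mathcal{R}|\mathcal{P}^{-n})$$

$$\le H(\tilde{\mathcal{P}}^r|\mathcal{P}^{-n}) + H(\mathcal{R}) \le \sum_{B \in \mathcal{P}^{-n}} \mu(B) H_B(\tilde{\mathcal{P}}^r) + \gamma r \log_2(\#\mathcal{P}) + 1$$

(we have used Fact 2 for the last passage). After dividing by $r$, we obtain

$$\sum_{B \in \mathcal{P}^{-n}} \mu(B) \tfrac{1}{r} H_B(\tilde{\mathcal{P}}^r) \ge h - \gamma \log_2(\#\mathcal{P}) - \tfrac{1}{r} \ge h - 2\alpha.$$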

Because each term ^Hb{'P''') is not larger than i log2(#P'') which was set to be at 
most h + a, we deduce that 

holds for B G P^" with /i-tolcrancc hence also with /x-tolerance 5. On the 
other hand, by Fact 2, for any B and A G P'', holds: 

ffB(^)< (l-Ms(A))log2(#P^) + l< (I-Mb(^)M^ + ") + !• 

Combining the last two displayed inequalities we establish that, with $\mu$-tolerance $\delta$
for $B \in \mathcal P^{-n}$ and then for every $A \in \overline{\mathcal P}^r$, holds

1 - Mb(^) > Ti 7 > 1-5. 

^ - h + a r{h + a) ~ 

So, $\mu_B(A) \le \delta$. Because $\mathcal P^r$ refines $\overline{\mathcal P}^r$, the elements of $\mathcal P^r$ are also not larger
than $\delta$. □

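We continue the proof with a lemma which can be deduced from [Ru, Lemma 3].
We provide a direct proof. For $\alpha > 0$ and $M \in \mathbb N$ let

$$S(M, \alpha) = \bigcup_{m \in \mathbb Z} [mM + \alpha M, (m+1)M - \alpha M) \cap \mathbb Z.$$

Lemma 2. For fixed $\alpha$ and $r$ there exists $M_0$ such that for every $M \ge M_0$ holds

$$H(\mathcal P^r \mid \mathcal P^- \vee \mathcal P^{S(M,\alpha)}) \ge rh - \alpha$$

(see Figure 3).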

Figure 3. The circles indicate the coordinates $0$ through $r - 1$; the conditioning sigma-field is over
the coordinates marked by stars, which includes the entire past and part of the future with gaps
of size $2\alpha M$ repeated periodically with period $M$ (the first gap is half the size).

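Proof. First assume that $r = 1$. Denote also

$$S'(M, \alpha) = \bigcup_{m \in \mathbb Z} [mM + \alpha M, (m+1)M) \cap \mathbb Z.$$

Let $M$ be so large that $H(\mathcal P^{[\alpha M, M)}) < (1 - \alpha)M(h + \gamma)$, where $\gamma = \frac{\alpha^2}{2(1-\alpha)}$. Then,
for any $m \ge 1$,

$$H(\mathcal P^{S'(M,\alpha) \cap [0,mM)} \mid \mathcal P^-) \le H(\mathcal P^{S'(M,\alpha) \cap [0,mM)}) < (1 - \alpha)mM(h + \gamma).$$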






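Because $H(\mathcal P^{[0,mM)} \mid \mathcal P^-) = mMh$, the complementary part of entropy must exceed
$mMh - (1 - \alpha)mM(h + \gamma)$ (which equals $\alpha mM(h - \frac\alpha2)$), i.e., we have

$$H(\mathcal P^{[0,mM) \setminus S'(M,\alpha)} \mid \mathcal P^- \vee \mathcal P^{S'(M,\alpha) \cap [0,mM)}) > \alpha mM(h - \tfrac\alpha2).$$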

Breaking the last entropy term as a sum over $j \in [0, mM) \setminus S'(M, \alpha)$ of the conditional
entropies of $\sigma^{-j}(\mathcal P)$ given the sigma-field over all coordinates left of $j$ and
all coordinates from $S'(M, \alpha) \cap [0, mM)$ right of $j$, and because every such term is
at most $h$, we deduce that more than half of these terms reach or exceed $h - \alpha$.
So, a term not smaller than $h - \alpha$ occurs for a $j$ within one of the gaps in the left
half of $[0, mM)$. Shifting by $j$, we obtain $H(\mathcal P \mid \mathcal P^- \vee \sigma^i(\mathcal P^{S'(M,\alpha) \cap [0, \frac{mM}{2})})) \ge h - \alpha$,
where $i \in [0, \alpha M)$ denotes the relative position of $j$ in the gap. As we increase $m$,
one value $i$ will repeat in this role along a subsequence $m'$. The operation $\vee$ is continuous
for increasing sequences of sigma-fields, hence $\mathcal P^- \vee \sigma^i(\mathcal P^{S'(M,\alpha) \cap [0, \frac{m'M}{2})})$
converges over $m'$ to $\mathcal P^- \vee \sigma^i(\mathcal P^{S'(M,\alpha) \cap [0,\infty)})$. The entropy is continuous for such a
passage, hence $H(\mathcal P \mid \mathcal P^- \vee \sigma^i(\mathcal P^{S'(M,\alpha)})) \ge h - \alpha$. The assertion now follows because
$S(M, \alpha)$ is contained in $S'(M, \alpha)$ shifted to the left by any $i \in [0, \alpha M)$.

Finally, if $r > 1$, we can simply argue for $\mathcal P^r$ replacing $\mathcal P$. This will impose
that $M_0$ and $M$ are divisible by $r$, but it is not hard to see that for large $M$ the
argument works without divisibility at a cost of a slight adjustment of $\alpha$. □

For a long block $B \in \mathcal P^{-n}$ let $((\mathcal P_B^r)^{\mathbb Z}, \mu_B, \sigma_B)$ denote the process induced on $B$
generated by the restriction of $\mathcal P^r$ to $B$ ($\sigma_B$ is the first return time map on $B$).
The following lemma is the crucial item in our argument.

Lemma 3. For every $\beta > 0$, $r \in \mathbb N$ and $K \in \mathbb N$ there exists $n_0$ such that for every
$n \ge n_0$, with $\mu$-tolerance $\beta$ for $B \in \mathcal P^{-n}$, with respect to $\mu_B$, $\mathcal P^r$ is $\beta$-independent
of jointly the past $\mathcal P^-$ and the first $K$ return times to $B$, $R_B^{(k)}$ ($k \in [1, K]$). In
particular, $((\mathcal P_B^r)^{\mathbb Z}, \mu_B, \sigma_B)$ is a $\beta$-independent process.

Proof. We choose ^ according to Fact 1, so that ^-independence is implied. Let a 
satisfy 

Let $n_0$ be so large that $H(\mathcal P^r \mid \mathcal P^{-n}) < rh + \alpha$ for every $n \ge n_0$ and that for every
$k \in [1, K]$ with $\mu$-tolerance $\alpha$ for $B \in \mathcal P^{-n}$ holds

$$\mu_B\{2^{n(h-\alpha)} \le R_B^{(k)} \le 2^{n(h+\alpha)}\} > 1 - \alpha$$

(we are using the Ornstein-Weiss Theorem [O-W2]; the multiplication by $k$, which
should appear for the $k$-th return time, is consumed by $\alpha$ in the exponent). Let
$M_0 \ge 2^{n_0(h-\alpha)}$ be so large that the assertion of Lemma 2 holds for $\alpha$, $r$ and $M_0$,
and that for every $M \ge M_0$,

$$(M + 1)^{\frac{h+\alpha}{h-\alpha}} < \alpha M^2 \quad\text{and}\quad \frac{\log_2(M+1)}{M(h-\alpha)} < \alpha.$$

We can now redefine (enlarge) $n_0$ and $M_0$ so that $M_0 = \lfloor 2^{n_0(h-\alpha)} \rfloor$. Similarly, for
each $n \ge n_0$ we set $M_n = \lfloor 2^{n(h-\alpha)} \rfloor$. Observe that the interval where the first
$K$ returns of most $n$-blocks $B$ may occur (up to probability $\alpha$) is contained in
$[M_n, \alpha M_n^2]$ (because $2^{n(h+\alpha)} \le (M_n + 1)^{\frac{h+\alpha}{h-\alpha}} < \alpha M_n^2$).

At this point we fix some $n \ge n_0$. The idea is to carefully select an $M$ between
$M_n$ and $2M_n$ (hence not smaller than $M_0$), such that the initial $K$ returns of nearly
every $n$-block happen most likely inside (with all its $n$ symbols) the set $S(M, \alpha)$,
so that they are "controlled" by the sigma-field $\mathcal P^{S(M,\alpha)}$. Let $\alpha' = \alpha + \frac nM$, so that
every $n$-block overlapping with $S(M, \alpha')$ is completely covered by $S(M, \alpha)$. By the
second assumption on $M \ge M_0$ and by the formula connecting $M_n$ and $n$, we have
$\alpha' < 2\alpha$. To define $M$ we will invoke the triple Fubini Theorem. Fix $k \in [1, K]$ and
consider the probability space

$$\mathcal P^{-n} \times [M_n, 2M_n] \times \mathbb N$$

equipped with the (discrete) measure $\mathbb M$ whose marginal on $\mathcal P^{-n} \times [M_n, 2M_n]$ is
the product of $\mu$ (more precisely, of its projection onto $\mathcal P^{-n}$) with the uniform
distribution on the integers in $[M_n, 2M_n]$, while, for fixed $B$ and $M$, the measure
on the corresponding $\mathbb N$-section is the distribution of the random variable $R_B^{(k)}$. In
this space let $\mathcal S$ be the set whose $\mathbb N$-section for a fixed $M$ (and any fixed $B$) is
the set $S(M, \alpha')$. We claim that for every $l \in [M_n, \alpha M_n^2] \cap \mathbb N$ (and any fixed $B$)
the $[M_n, 2M_n]$-section of $\mathcal S$ has measure exceeding $1 - 16\alpha$. This is quite obvious
(even for every $l \in [M_n, \infty)$ and with $1 - 15\alpha$) if $[M_n, 2M_n]$ is equipped with the
normalized Lebesgue measure (see Figure 4).
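The effect of widening the gaps from $\alpha$ to $\alpha' = \alpha + n/M$ can be checked by brute force. The following is only an illustrative sketch: the helper names (`in_S`, `block_covered`, `block_overlaps`, `widening_works`) and the sample parameters are assumptions made for this example and do not come from the argument itself. It tests membership in $S(M, \alpha)$ through residues modulo $M$ and verifies that every $n$-block meeting $S(M, \alpha')$ lies entirely inside $S(M, \alpha)$.

```python
def in_S(x: int, M: int, a: float) -> bool:
    """True if the integer x lies in S(M, a), the union over m of
    [mM + aM, (m+1)M - aM) intersected with the integers."""
    r = x % M  # position of x inside its period [mM, (m+1)M)
    return a * M <= r < M - a * M


def block_covered(start: int, n: int, M: int, a: float) -> bool:
    """True if all n coordinates start, ..., start+n-1 lie in S(M, a)."""
    return all(in_S(start + i, M, a) for i in range(n))


def block_overlaps(start: int, n: int, M: int, a: float) -> bool:
    """True if at least one of the n coordinates lies in S(M, a)."""
    return any(in_S(start + i, M, a) for i in range(n))


def widening_works(M: int, a: float, n: int) -> bool:
    """Check, over a few periods, that every n-block overlapping
    S(M, a + n/M) is completely covered by S(M, a)."""
    a_prime = a + n / M
    return all(
        block_covered(s, n, M, a)
        for s in range(-M, 2 * M)
        if block_overlaps(s, n, M, a_prime)
    )


if __name__ == "__main__":
    print(widening_works(M=100, a=0.1, n=5))
    print(widening_works(M=1000, a=0.05, n=7))
```

The check succeeds because an $n$-block touching $[\alpha M + n, M - \alpha M - n)$ cannot reach further than $n - 1$ positions beyond either end of that interval.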




Figure 4: The complement of $\mathcal S$ splits into thin skew strips shown in the picture. The normalized
Lebesgue measure of any vertical section of the $j$-th strip (starting at $jM_n$ with $j \ge 1$) is
of order $\alpha/j$. Each vertical line at $l \ge M_n$ intersects strips with indices $j, j+1, j+2$ up
to at most $2j$ (for some $j$), so the joint measure of the complement of the section of $\mathcal S$ does not
exceed $15\alpha$.

Figure 5: The discretization replaces the Lebesgue measure by the uniform measure on $M_n$ integers,
thus the measure of any interval can deviate from its Lebesgue measure by at most $\frac{1}{M_n}$.
For $l \le \alpha M_n^2$ the corresponding section of $\mathcal S$ (in this picture drawn horizontally) consists of at
most $\alpha M_n$ intervals, so its measure can deviate by no more than $\alpha$.

In the discrete case, however, a priori it might happen that the integers along
some $[M_n, 2M_n]$-section often "miss" the section of $\mathcal S$, leading to a decreased measure
value. (For example, it is easy to see that for $l = (2M_n)!$ the measure of the section
of $\mathcal S$ is zero.) But because we restrict to $l \le \alpha M_n^2$, the discretization does not affect
the measure of the section of $\mathcal S$ by more than $\alpha$, and the estimate with $1 - 16\alpha$
holds (see Figure 5 above).
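The discrete section measure discussed above can be computed directly for small parameters. The sketch below is a numerical illustration under stated assumptions: the function names `in_S` and `section_fraction` are hypothetical, and the values $\alpha = 0.05$, $\alpha' = 2\alpha$ (the worst case allowed by $\alpha' < 2\alpha$) and $M_n = 200$ are chosen only for the demonstration. It counts the integers $M \in [M_n, 2M_n]$ for which a given $l$ lies in $S(M, \alpha')$, and also reproduces the degenerate case $l = (2M_n)!$.

```python
import math


def in_S(x: int, M: int, a: float) -> bool:
    """True if the integer x lies in S(M, a), i.e. its residue modulo M
    falls in [aM, M - aM)."""
    r = x % M
    return a * M <= r < M - a * M


def section_fraction(l: int, M_n: int, a_prime: float) -> float:
    """Fraction of the integers M in [M_n, 2*M_n] such that l is in S(M, a_prime).
    This is the uniform (discrete) measure of the [M_n, 2M_n]-section at l."""
    candidates = range(M_n, 2 * M_n + 1)
    hits = sum(1 for M in candidates if in_S(l, M, a_prime))
    return hits / len(candidates)


if __name__ == "__main__":
    alpha, M_n = 0.05, 200          # illustrative values only
    a_prime = 2 * alpha             # worst case permitted by a' < 2*alpha
    upper = int(alpha * M_n ** 2)   # restrict to l <= alpha * M_n^2
    worst = min(section_fraction(l, M_n, a_prime) for l in range(M_n, upper + 1))
    print("smallest section measure:", worst, "bound:", 1 - 16 * alpha)
    # For l = (2*M_n)! every M <= 2*M_n divides l, so the section is empty.
    print(section_fraction(math.factorial(10), 5, 0.1))
```

In the second call every residue `l % M` is zero, which is why the restriction to $l \le \alpha M_n^2$ is needed in the discrete setting.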
Taking into account all other inaccuracies (the smaller than $\alpha$ part of $\mathcal S$ outside
$[M_n, \alpha M_n^2]$ and the smaller than $\alpha$ part of $\mathcal S$ projecting onto blocks $B$ which do not
obey the Ornstein-Weiss return time estimate) it is safe to claim that

$$\mathbb M(\mathcal S) > 1 - 18\alpha.$$

This implies that for every $M$ from a set of measure at least $1 - 18\sqrt\alpha$ the measure
of the $(\mathcal P^{-n} \times \mathbb N)$-section of $\mathcal S$ is larger than or equal to $1 - \sqrt\alpha$. For every such
$M$, with $\mu$-tolerance $\sqrt[4]\alpha$ for $B \in \mathcal P^{-n}$, the probability $\mu_B$ that the $k$-th repetition
of $B$ falls in $S(M, \alpha')$ (hence with all its $n$ terms inside the set $S(M, \alpha)$) is at least
$1 - \sqrt[4]\alpha$.

Because $18K\sqrt\alpha < 1$, there exists at least one $M$ for which the above holds for
every $k \in [1, K]$. This is our final choice of $M$ which from now on remains fixed. For
this $M$, and for cylinders $B$ chosen with $\mu$-tolerance $K\sqrt[4]\alpha$, each of the considered $K$
returns of $B$ with probability $1 - \sqrt[4]\alpha$ falls (with all its coordinates) inside $S(M, \alpha)$.
Thus, for such a $B$, with probability $1 - K\sqrt[4]\alpha$ the same holds simultaneously for
all $K$ return times. In other words, there is a set $U_B$ of measure not exceeding
$K\sqrt[4]\alpha$ outside of which $R_B^{(k)} = \tilde R_B^{(k)}$, where $\tilde R_B^{(k)}$ is defined as the time of the $k$-th
return of $B$ fully visible inside $S(M, \alpha)$. Notice that $\tilde R_B^{(k)}$ is $\mathcal P^{S(M,\alpha)}$-measurable.

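Let us go back to our entropy estimates. We have, by Lemma 2,

$$
\begin{aligned}
\sum_{B \in \mathcal P^{-n}} \mu(B) H_B(\mathcal P^r \mid \mathcal P^- \vee \mathcal P^{S(M,\alpha)}) &= H(\mathcal P^r \mid \mathcal P^{-n} \vee \mathcal P^- \vee \mathcal P^{S(M,\alpha)}) \\
&= H(\mathcal P^r \mid \mathcal P^- \vee \mathcal P^{S(M,\alpha)}) \ge rh - \alpha \ge H(\mathcal P^r \mid \mathcal P^{-n}) - 2\alpha \\
&= \sum_{B \in \mathcal P^{-n}} \mu(B) H_B(\mathcal P^r) - 2\alpha.
\end{aligned}
$$

Because $H_B(\mathcal P^r \mid \mathcal P^- \vee \mathcal P^{S(M,\alpha)}) \le H_B(\mathcal P^r)$ for every $B$, we deduce that with
$\mu$-tolerance $\sqrt{2\alpha}$ for $B \in \mathcal P^{-n}$ must hold

$$H_B(\mathcal P^r \mid \mathcal P^- \vee \mathcal P^{S(M,\alpha)}) \ge H_B(\mathcal P^r) - \sqrt{2\alpha} \ge H_B(\mathcal P^r) - \xi.$$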
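The passage from an averaged entropy inequality to a statement valid "with $\mu$-tolerance $\sqrt{2\alpha}$" is an instance of the Markov inequality: if nonnegative deficits have weighted mean at most $d$, then the total weight of the indices whose deficit exceeds $\sqrt d$ is at most $\sqrt d$. The sketch below only illustrates this step; the function name `markov_tolerance` and the toy data are assumptions made for the example.

```python
import math


def markov_tolerance(weights, deficits, mean_bound):
    """Markov step: given probability weights mu(B), nonnegative deficits
    (for instance H_B(P^r) minus the conditional entropy) and a bound on
    their weighted mean, return (threshold, exceptional_mass), where
    threshold = sqrt(mean_bound) and exceptional_mass is the total weight
    of the indices whose deficit exceeds the threshold."""
    mean_deficit = sum(w * d for w, d in zip(weights, deficits))
    if any(d < 0 for d in deficits) or mean_deficit > mean_bound + 1e-12:
        raise ValueError("hypotheses of the Markov step are not satisfied")
    threshold = math.sqrt(mean_bound)
    exceptional = sum(w for w, d in zip(weights, deficits) if d > threshold)
    return threshold, exceptional


if __name__ == "__main__":
    alpha = 0.01
    # Toy data: the weighted mean deficit equals 2*alpha.
    thr, exc = markov_tolerance([0.9, 0.1], [0.0, 0.2], 2 * alpha)
    print(thr, exc, exc <= thr)
```

By the Markov inequality the exceptional mass is at most `mean_bound / threshold`, which equals the threshold itself.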

Combining this with the preceding arguments, with $\mu$-tolerance $K\sqrt[4]\alpha + \sqrt{2\alpha} < \beta$
for $B \in \mathcal P^{-n}$ both the above entropy inequality holds, and we have the estimates
of the measures of sets $U_B$. By the choice of $\xi$, we obtain that with respect to
$\mu_B$, $\mathcal P^r$ is jointly $\xi$-independent of the past and the modified return times $\tilde R_B^{(k)}$
($k \in [1, K]$). Because $\mu_B(U_B) \le K\sqrt[4]\alpha < \xi$, this clearly implies $\beta$-independence if
each $\tilde R_B^{(k)}$ is replaced by $R_B^{(k)}$. □

To complete the proof of Theorem 1 it now remains to put the items together. 

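Proof of Theorem 1. Fix an $\varepsilon > 0$. On $[0, \infty)$, the functions

$$g_p(t) = \min\{1, \tfrac{1}{\log e_p}(1 - e_p^{-t}) + pt\},$$

where $e_p = (1 - p)^{-\frac1p}$, decrease uniformly to $1 - e^{-t}$ as $p \to 0^+$. So, let $\delta$ be such
that $g_\delta(t) \le 1 - e^{-t} + \varepsilon$ for every $t$. We also assume that

$$(1 - 2\delta)(1 - \delta) > 1 - \varepsilon.$$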
Let $r$ be specified by Lemma 1, so that $\mu_B(A) \le \delta$ for every $n \ge 1$, every $A \in \mathcal P^r$
and for $B \in \mathcal P^{-n}$ with $\mu$-tolerance $\delta$. On the other hand, once $r$ is fixed, the
partition $\mathcal P^r$ has at most $(\#\mathcal P)^r$ elements, so with $\mu_B$-tolerance $\delta$ for $A \in \mathcal P^r$,
$\mu_B(A) \ge \delta(\#\mathcal P)^{-r}$. Let $\mathcal A_B$ be the subfamily of $\mathcal P^r$ (depending on $B$) where this
inequality holds. Let $K$ be so large that for any $p \ge \delta(\#\mathcal P)^{-r}$,

$$\sum_{k=K+1}^{\infty} p(1 - p)^{k-1} < \frac\delta2,$$

and choose $\beta < \delta$ so small that

$$(K^2 + K + 1)\beta < \frac\delta2.$$
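The choice of $K$ depends only on a lower bound for $p$, because the tail of a geometric series has a closed form: $\sum_{k > K} p(1-p)^{k-1} = (1-p)^K$, which decreases in $p$. The sketch below assumes that the tail is to be made smaller than $\delta/2$ (the threshold is read off from the later estimate $(K^2 + K + 1)\beta + \delta/2 \le \delta$ and should be treated as an assumption here); the names `geometric_tail` and `smallest_K` are hypothetical.

```python
def geometric_tail(p: float, K: int) -> float:
    """Sum over k > K of p * (1 - p)**(k - 1); the series telescopes to (1 - p)**K."""
    return (1.0 - p) ** K


def smallest_K(p_min: float, bound: float) -> int:
    """Smallest K with geometric_tail(p_min, K) < bound. Since the tail is
    decreasing in p, the same K works for every p >= p_min."""
    if not (0.0 < p_min <= 1.0) or bound <= 0.0:
        raise ValueError("need 0 < p_min <= 1 and bound > 0")
    K = 0
    while geometric_tail(p_min, K) >= bound:
        K += 1
    return K


if __name__ == "__main__":
    delta, cardinality, r = 0.2, 2, 3          # illustrative values only
    p_min = delta * cardinality ** (-r)        # lower bound delta * (#P)^(-r)
    K = smallest_K(p_min, delta / 2)
    print(K, geometric_tail(p_min, K))
```

Since $p$ may be as small as $\delta(\#\mathcal P)^{-r}$, the resulting $K$ grows roughly like $\log(2/\delta)/p_{\min}$, so it can be large when $r$ is large.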

The application of Lemma 3 now provides an $n_0$ such that for any $n \ge n_0$, with
$\mu$-tolerance $\beta$ for $B \in \mathcal P^{-n}$, the process induced on $B$ generated by $\mathcal P^r$ has the
desired $\beta$-independence properties involving the initial $K$ return times of $B$. So,
with tolerance $\delta + \beta < 2\delta$ we have both the above $\beta$-independence and the estimate
$\mu_B(A) \le \delta$ for every $A \in \mathcal P^r$. Let $\mathcal B_n$ be the subfamily of $\mathcal P^{-n}$ where these two
conditions hold. Fix some $n \ge n_0$.

Let us consider a cylinder set $B \cap A \in \mathcal P^{[-n, r)}$ (or, equivalently, the block $BA$),
where $B \in \mathcal B_n$, $A \in \mathcal A_B$. The length of $BA$ is $n + r$, which represents an arbitrary
integer larger than $n_0 + r$. Notice that the family of such sets $BA$ covers more than
$(1 - 2\delta)(1 - \delta) > 1 - \varepsilon$ of the space.

We will examine the distribution of the normalized first return time for $BA$. In
addition to our customary notations of return times, let $R_A^{(B)}$ be the first (absolute)
return time of $A$ in $((\mathcal P_B^r)^{\mathbb Z}, \mu_B, \sigma_B)$, i.e., the variable defined on $BA$, counting the
number of visits to $B$ until the first return to $BA$. Let $p = \mu_B(A)$ (recall, this is
not smaller than $\delta(\#\mathcal P)^{-r}$). We have

$$F_{BA}(t) = \mu_{BA}\{\overline R_{BA} \le t\} = \mu_{BA}\{R_{BA} \le \tfrac{t}{\mu(BA)}\} = \sum_{k \ge 1} \mu_{BA}\{R_A^{(B)} = k,\ R_B^{(k)} \le \tfrac{t}{\mu(BA)}\}.$$

The $k$-th term of this sum equals

$$\tfrac1p\, \mu_B(\{A_k = A\} \cap \{A_{k-1} \ne A\} \cap \dots \cap \{A_1 \ne A\} \cap \{A_0 = A\} \cap \{R_B^{(k)} \le \tfrac{t}{\mu(BA)}\}),$$

where $A_i$ is the $r$-block following the $i$-th copy of $B$ (the counting starts from $0$ at
the copy of $B$ positioned at $[-n, -1]$).

By Lemma 3, for $k \le K$, in this intersection of sets each term is $\beta$-independent
of the intersection right from it. So, proceeding from the left, we can replace the
probabilities of the intersections by products of probabilities, allowing an error of
$\beta$. Note that the last term equals $\mu_B\{R_B^{(k)} \le \frac{t}{\mu(BA)}\} = F_B^{(k)}(\frac tp)$. Jointly, the inaccuracy
will not exceed $(K + 1)\beta$:

$$\left| \mu_{BA}\{R_A^{(B)} = k,\ R_B^{(k)} \le \tfrac{t}{\mu(BA)}\} - p(1 - p)^{k-1} F_B^{(k)}(\tfrac tp) \right| \le (K + 1)\beta.$$



Similarly, we also have $|\mu_{BA}\{R_A^{(B)} = k\} - p(1 - p)^{k-1}| \le K\beta$, hence the tail of the
series $\sum_k \mu_{BA}\{R_A^{(B)} = k\}$ above $K$ is smaller than $K^2\beta$ plus the tail of the geometric
series $\sum_k p(1 - p)^{k-1}$, which, by the fact that $p \ge \delta(\#\mathcal P)^{-r}$, is smaller than $\frac\delta2$. Therefore

$$F_{BA}(t) \approx \sum_{k \ge 1} p(1 - p)^{k-1} F_B^{(k)}(\tfrac tp),$$

up to $(K^2 + K + 1)\beta + \frac\delta2 \le \delta$, uniformly for every $t$. By the application of Lemma 0,
$G_{BA}$ satisfies

$$G_{BA}(t) \le \min\{1, \tfrac{1}{\log e_p}(1 - e_p^{-t}) + \delta t\} \le g_\delta(t) \le 1 - e^{-t} + \varepsilon$$

(because $p \le \delta$). We have proved that for our choice of $\varepsilon$ and an arbitrary length
$m \ge n_0 + r$, with $\mu$-tolerance $\varepsilon$ for the cylinders $C \in \mathcal P^m$, the intensity of repelling
between visits to $C$ is at most $\varepsilon$. This concludes the proof of Theorem 1. □



Proof of Theorem 2

This proof requires a number of technical ingredients, such as "semi-periodic
markers" or short "transiently forbidden words". The two facts below are standard
exercises in ergodic theory and we only outline their proofs.

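Fact 3. In a process $(\mathcal P^{\mathbb Z}, \mu, \sigma)$ of positive entropy, where $\mathcal P$ is finite or countable,
for each $k \in \mathbb N$ and $\varepsilon > 0$ there exist an $l \in \mathbb N$ and $k$ words $w_1, w_2, \dots, w_k$ of length $l$
such that

1. each $w_i$ starts and ends with the same symbol $a \in \mathcal P$, independent of $i$,

2. each $w_i$ has measure $\mu$ at most

3. for each $i$ the set

$$w_i \setminus \bigcup_{j \ne i} \bigcup_{m=-l}^{l} \sigma^m(w_j)$$

has positive measure $\mu$.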

Proof. For 1. use recurrence in the $k$-fold product system, and for 2. use the
Shannon-McMillan-Breiman Theorem. Condition 3. follows easily from the high
complexity in positive entropy. □

Fact 4. In every measure-preserving system $(X, \mu, T)$ of positive entropy $h$, for
each sufficiently large $r \in \mathbb N$ there exists a "semiperiodic $r$-marker", i.e., a
measurable set $F$ such that the first return time $R_F$ assumes only two values: $r$ and
$r + 1$.

Proof. The system has a Bernoulli factor of entropy $h$. For large $r$ the binary process
obtained by random concatenations of two blocks, $0^{r-1}1$ and $0^r1$, is Bernoulli with
entropy smaller than $h$, hence it is a factor of $(X, \mu, T)$. The lift of the cylinder
over $1$ is the desired set $F$ in $X$. □

We are in a position to present the proof of Theorem 2. 

Proof of Theorem 2. Fix $\varepsilon > 0$, $t > 0$ and $N \in \mathbb N$. Consider the following property
of a (finite or countable) partition $\mathcal P$: for every $n \in [N, N^2]$, $\tilde F_B(t) < \varepsilon$ with
$\mu$-tolerance $\varepsilon$ for $B \in \mathcal P^n$. (Recall that $\tilde F_B$ denotes the distribution function of the
normalized hitting time for $B$.) It is easy to see that it holds on an open set $\mathcal E_{\varepsilon,t,N}$
of partitions (both in the space of partitions into at most $m$ elements and in the
space of at most countable partitions); for each $n$ we can take the same finite sets
of "good" $n$-cylinders $B$ for the partitions in a neighborhood of $\mathcal P$ as for $\mathcal P$. Of
course, the set

$$\mathcal E_{\varepsilon,t} = \bigcup_{N \ge 1} \mathcal E_{\varepsilon,t,N}$$

of partitions such that the same property holds for some $N$ is also open. The main
effort in the proof will be to show that this set is also dense. Once this is done,
the proof is complete, because then the dense $G_\delta$ set of partitions which reveal
strong attracting can be obtained by intersecting the sets $\mathcal E_{\varepsilon,t}$ over countably many
pairs $(\varepsilon, t)$ with $\varepsilon \to 0$ and $t \to \infty$. Notice that for any infinite sequence of natural
numbers $N$ the set $\bigcup [N, N^2]$ has upper density 1 in $\mathbb N$.
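The upper density claim can be made concrete: at the right endpoint $L = N^2$ of each interval, at least $N^2 - N + 1$ of the integers $1, \dots, L$ are covered, so the covered fraction exceeds $1 - 1/N$. The sketch below assumes the intervals are $[N, N^2]$ (the exponent is taken from the range of lengths $n$ used in the proof and is an assumption of this example); the function names and the sample sequence are hypothetical.

```python
def covered_fraction(Ns, L):
    """Fraction of the integers 1..L lying in the union of the intervals [N, N**2]."""
    covered = [False] * (L + 1)
    for N in Ns:
        for x in range(N, min(N * N, L) + 1):
            covered[x] = True
    return sum(covered[1:]) / L


def densities_at_right_endpoints(Ns):
    """Covered fraction evaluated at L = N**2 for each N of the sequence."""
    return [(N, covered_fraction(Ns, N * N)) for N in Ns]


if __name__ == "__main__":
    # A sparse illustrative sequence; any infinite sequence behaves the same way.
    for N, d in densities_at_right_endpoints([2, 10, 100]):
        print(N, d, d >= 1 - 1 / N)
```

Along the endpoints $L = N^2$ the fraction tends to 1 as $N$ grows, which is exactly the statement that the upper density equals 1, however sparse the sequence is.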

In order to prove the density of $\mathcal E_{\varepsilon,t}$, fix a (finite or countable) partition $\mathcal P$. Set

fc = rfi + i, ^=^, M = 2kt. 

Choose words wi, W2, - . . , according to Fact 3. Let N be so large, that with 
/i-tolerance | in every AT-block, every word un occurs at least once so that it does 
not overlap with any other Wj (see condition 3. in Fact 3). Obviously, the same 
holds if N is replaced by any larger integer. For every n G [N, 7V^] we can thus 
select a finite collection of "good" n-blocks which satisfy the above and cover 1 — § 
of the space. Let p be so large, that < |, and that every good n-block (for any 
n e [N,N'^]) occurs at least M times in every, up to /u-tolerance 5, |-block. Let 
r = kp. 

Now we invoke the semiperiodic $r$-marker set $F$ of Fact 4. Every $\mathcal P$-name can
be divided at visits to $F$ into a concatenation of $r$-blocks and $(r + 1)$-blocks. For
simplicity, we will call all of them component $r$-blocks. Every component $r$-block
$C$ will be further decomposed as a concatenation of $k$ $p$-blocks $C_1C_2 \dots C_k$ ($C_k$ is
either a $p$-block or a $(p + 1)$-block, but again, for simplicity, we will call all these
blocks $p$-blocks). We fix a symbol $b \ne a$ in $\mathcal P$ (recall that $a$ denotes the first and
last symbol of each $w_i$). Now we modify the partition $\mathcal P$ by changing the $\mathcal P$-names
of points, as follows: In every $\mathcal P$-name we replace, for every $i$, every occurrence of
$w_i$ within every $i$-th $p$-block $C_i$ of every component $r$-block $C$ and within the first
$N^2$ positions of the following $p$-block $C_{i+1}$ (here $k + 1 = 1$), by the word $w_0 = b^l$.
Notice that there is no collision when overlapping words are replaced.

Figure 6: A $\mathcal P$-name before and after modification.

Let $C_i'$ denote the right part of $C_i$, obtained by cutting off its left $N^2$ entries.

First observe that the change affects only a subset of $\bigcup_{i=1}^{k} \bigcup_{m=0}^{l-1} \sigma^m(w_i)$, whose
measure is smaller than $\varepsilon$. Thus the distance between $\mathcal P$ and the partition $\mathcal P'$ after
the modification is less than $\varepsilon$.

Notice also that the modification completely forbids the word $w_i$ within any $C_i$
and $N^2$ positions right from it, because all "old" occurrences are removed, and the
insertions of the block $w_0$ do not create any overlapping "new" instances of $w_i$. On
the other hand, these modifications do not affect inside $C_i'$ the words $w_j$ with $j \ne i$
which have not overlapped with $w_i$ before the change.

For fixed $n \in [N, N^2]$ and $i \in [1, k]$ observe an $n$-block $B'$ (over the partition $\mathcal P'$)
obtained from a "good" $n$-block $B$ over $\mathcal P$ appearing inside some $C_i'$. Such blocks
(with all possible values of i) still cover more than 1 — | — > 1 — e of the 
space. Because, for each $j \ne i$, $B'$ contains at least one unaffected copy of $w_j$ (not
overlapping with $w_i$ before the change), $B'$ cannot occur with its leftmost position
located in any $C_j$ except for $j = i$. On the other hand, inside $C_i'$ it occurs as many
times as $B$ did before the change. Because the blocks $C_i'$ jointly contain a fraction

^ > ^^p- of all I -blocks, only a fraction of at most = i of all blocks C'^ 
may contain less than $M$ copies of $B'$. Thus the measure of the cylinder $B'$
(with respect to the partition $\mathcal P'$) is at least $\frac{M}{2r} = \frac tp$. The waiting time for $B'$ is
not larger than $p$ only within $C_i$ and the preceding $p$-block, so $\mu\{V_{B'} \le p\} \le \frac2k < \varepsilon$.
After normalizing, we obtain $\tilde F_{B'}(t) < \varepsilon$. We have proved that $\mathcal P' \in \mathcal E_{\varepsilon,t,N}$. This
completes the proof of the claim that $\mathcal E_{\varepsilon,t}$ is dense among the partitions, and ends
the whole proof. □

For a more complete image of a process generated by a typical partition, let us 

formulate one more fact. 

Fact 5. Let $(X, \mu, T)$ be an ergodic measure-preserving transformation with positive
entropy. Fix some $2 \le m \le \aleph_0$. Then, in the Polish space of all measurable
partitions $\mathcal P$ of $X$ into at most $m$ elements, the set of partitions generating positive
entropy is open and dense.

Proof. It is known that entropy is continuous in the Rokhlin metric, so positive
entropy is an open property (see e.g. [P]). To obtain density it suffices to perturb
a zero-entropy partition by a small set not measurable with respect to the Pinsker
algebra. □

Consequences for limit laws 

The studies of limit laws for return/hitting time statistics are based on the
following approach: For $x \in \mathcal P^{\mathbb Z}$ define $\tilde F_{x,n} = \tilde F_B$ (and $F_{x,n} = F_B$), where $B$ is
the block $x[0, n)$ (or the cylinder in $\mathcal P^n$ containing $x$). Because for nondecreasing
functions $F : [0, \infty) \to [0, 1]$ the weak convergence coincides with the convergence
at continuity points, and it makes the space of such functions metric and compact,
for every $x$ there exists a well defined collection of limit distributions for $\tilde F_{x,n}$ (and
for $F_{x,n}$) as $n \to \infty$. They are called limit laws for the hitting (return) times at
$x$. Due to the integral relation ($\tilde F_B \approx G_B$) a sequence of return time distributions
converges weakly if and only if the corresponding hitting time distributions converge
pointwise (see [H-L-V]), so the limit laws for the return times completely determine
those for hitting times and vice versa. A limit law is essential if it appears along
some subsequence $(n_k)$ for $x$'s in a set of positive measure. In particular, the
strongest situation occurs when there exists an almost sure limit law along the full
sequence $(n)$. In such case the process is said to have exponential asymptotics. Most
of the results concerning the limit laws, obtained so far, can be classified in three
major groups:

a) characterizations of possible essential limit laws for specific zero entropy
processes (e.g. [D-M], [C-K]; these limit laws are usually atomic for return times or
piecewise linear for hitting times),

b) finding classes of processes with exponential asymptotics (e.g. [A-G], [H-S-V]),
and

c) results concerning non-essential limit laws, limit laws along sets other than
cylinders (see [L]; every probabilistic distribution with expected value not exceeding
1 can occur in any process as the limit law for such general return times), or
other very specific topics.
As a consequence of our Theorem 1, we obtain, for the first time, a serious bound
on the possible essential limit laws for the hitting time statistics along cylinders in
the general class of ergodic positive entropy processes. The statement (1) below
is even slightly stronger, because we require, for a subsequence, convergence on a
positive measure set, but not necessarily to a common limit.

Theorem 3. Assume ergodicity and positive entropy of the process $(\mathcal P^{\mathbb Z}, \mu, \sigma)$.

(1) If a subsequence $(n_k)$ is such that $\tilde F_{x,n_k}$ converge pointwise to some limit
laws $\tilde F_x$ on a positive measure set $A$ of points $x$, then almost surely on $A$,
$\tilde F_x(t) \le 1 - e^{-t}$ at each $t \ge 0$.

(2) If $(n_k)$ grows sufficiently fast, then there is a full measure set such that for
every $x$ in this set holds: $\limsup_k \tilde F_{x,n_k}(t) \le 1 - e^{-t}$ at each $t \ge 0$.

Proof. The implication from Theorem 1 to Theorem 3 is obvious and we leave it to
the reader. For (2) we hint that $(n_k)$ must grow fast enough to ensure summability
of the measures of the sets where the intensity of repelling persists; then the
Borel-Cantelli Lemma applies. □

Our Theorem 2 (again combined with the Borel-Cantelli Lemma) shows that 
a typical positive entropy process (including Bernoulli processes) admits the zero 
function as an essential limit law for the distributions of the waiting time. In 
particular, not all Bernoulli processes have exponential asymptotics. 

An example 

It is important not to be misled by an oversimplified approach to Theorem 1. 
The "decay of repelling" in positive entropy processes appears to agree with the 
intuitive understanding of entropy as chaos: repelling is a "self-organizing" prop- 
erty; it leads to a more uniform, hence less chaotic, distribution of an event along 
a typical orbit. Thus one might expect that repelling with intensity $\varepsilon$ revealed by
a fraction $\xi$ of all $n$-blocks contributes to lowering an upper estimate of the entropy
by some percentage proportional to $\xi$ and depending increasingly on $\varepsilon$. If
this happens for infinitely many lengths $n$ with the same parameters $\xi$ and $\varepsilon$, the
entropy should be driven to zero by a geometric progression. Surprisingly, it is not
quite so, and the phenomenon has more subtle grounds. We will present an exam- 
ple which exhibits the incorrectness of such intuition. Note also that in the proof 
of Theorem 1 the entropy is "killed completely in one step" , that means, positive 
entropy and persistent repelling lead to a contradiction by examining the blocks 
of one sufficiently large length n; we do not use any iterated procedure requiring 
repelling for infinitely many lengths. 

The construction below will show that for each $\delta > 0$ and $n \in \mathbb N$ there exists
$N \in \mathbb N$ and an ergodic process on $N$ symbols with entropy $\log_2 N - \delta$, such that
the $n$-blocks from a collection of joint measure equal to $\frac1n$ repel with nearly the
maximal possible intensity $e^{-1}$. Because $\delta$ can be extremely small compared to $\frac1n$,
this construction illustrates that there is no "reduction of entropy" by an amount
proportional to the fraction of blocks which reveal strong repelling.

Example 1. Let $\mathcal P$ be an alphabet of a large cardinality $N$. Divide $\mathcal P$ into two
disjoint subsets, one, denoted $\mathcal P_0$, of cardinality $N_0 = N2^{-\delta}$ and the relatively small
(but still very large) rest which we denote by $\{1, 2, \dots, r\}$ (we will refer to these
symbols as "markers"). For $i = 1, 2, \dots, r$, let $\mathcal B_i$ be the collection of all $n$-blocks
whose first $n - 1$ symbols belong to $\mathcal P_0$ and the terminal symbol is the marker
$i$. The cardinality of $\mathcal B_i$ is $N_0^{n-1}$. Let $\mathcal C_i$ be the collection of all blocks of length
$nN_0^{n-1}$ obtained as concatenations of blocks from $\mathcal B_i$ using each of them exactly
once. The cardinality of $\mathcal C_i$ is $(N_0^{n-1})!$. Let $X$ be the subshift whose points are
infinite concatenations of blocks from $\bigcup_{i=1}^{r} \mathcal C_i$, in which every block belonging to
$\mathcal C_i$ is followed by a block from $\mathcal C_{i+1}$ ($1 \le i < r$) and every block belonging to $\mathcal C_r$
is followed by a block from $\mathcal C_1$. Let $\mu$ be the shift-invariant measure of maximal
entropy on $X$. It is immediate to see that the entropy of $\mu$ is $\frac{1}{nN_0^{n-1}} \log_2((N_0^{n-1})!)$,
which, for large $N$, nearly equals $\log_2 N_0 = \log_2 N - \delta$. Finally observe that the
measure of each $B \in \mathcal B_i$ equals $\frac{1}{nrN_0^{n-1}}$, the joint measure of $\bigcup_{i=1}^{r} \mathcal B_i$ is exactly
$\frac1n$, and every block $B$ from this family appears in any $x \in X$ with gaps ranging
between $\frac{1 - 1/r}{\mu(B)}$ and $\frac{1 + 1/r}{\mu(B)}$, revealing strong repelling.

Remark 1. Viewing the blocks of length $nrN_0^{n-1}$ starting with a block from $\mathcal C_1$ as
a new alphabet, and repeating the above construction inductively, we can produce
an example (with the measure of maximal entropy on the intersection of systems
created in consecutive steps) with entropy $\log_2 N - 2\delta$, in which the strong repelling
will occur with probability — for infinitely many lengths n/j. 

Remark 2. The process described in the above remark is (somewhat coincidentally; it
was not designed for that) bilaterally deterministic: for every $m \in \mathbb N$ the sigma-field
$\mathcal P^{(-\infty,-m] \cup [m,\infty)}$ equals the full (product) sigma-field. Indeed, suppose we see all
entries of a $\mathcal P$-name of a point $x$ except on the interval $(-m, m)$. In a typical point,
this interval is contained between a pair of successive markers $i$ for some level $k$ of
the inductive construction. Then, by examining this name's entries far enough to
the left and right, we will see in complete form all blocks but one (the one covering the
coordinate zero) from the family $\mathcal B_i$ which constitute the block $C \in \mathcal C_i$ covering the
considered interval. Because every block from $\mathcal B_i$ is used in $C$ exactly once, by
elimination, we will be able to determine the missing block from $\mathcal B_i$ and hence all
symbols in $(-m, m)$.

Questions 

Question 1. Is there a speed of the convergence to zero of the joint measure of the
"bad" blocks in Theorem 1? More precisely, does there exist a positive function
$s(n, \varepsilon, \#\mathcal P)$ converging to zero as $n$ grows, such that if for some $\varepsilon$ and infinitely
many $n$'s, the joint measure of the $n$-blocks which repel with intensity $\varepsilon$ exceeds
$s(n, \varepsilon, \#\mathcal P)$, then the process has necessarily entropy zero? (By Example 1, $\frac1n$
is not enough.)

Question 2. Can one strengthen Theorem 3 as follows:

$$\limsup_{n \to \infty} \tilde F_{x,n} \le 1 - e^{-t} \quad \mu\text{-almost everywhere?}$$

Question 3. In Lemma 3, can one obtain $\mathcal P^r$ conditionally $\beta$-independent of jointly
the past and all return times $R_B^{(k)}$ ($k \ge 1$) (for sufficiently large $n$, with $\mu$-tolerance
$\beta$ for $B \in \mathcal P^{-n}$)? In other words, can the $\beta$-independent process $((\mathcal P_B^r)^{\mathbb Z}, \mu_B, \sigma_B)$
be obtained $\beta$-independent of the factor-process generated by the partition into $B$
and its complement?

Question 4. (suggested by J.-P. Thouvenot) Find a purely combinatorial proof of
Theorem 1, by counting the quantity of very long strings (of length $m$) inside
which a positive fraction (in measure) of all $n$-blocks repel with a fixed intensity.
For sufficiently large $n$ this quantity should be eventually (as $m \to \infty$) smaller than
h"^ for any preassigned positive h. 